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1.  Introduction 

Kuck  and  Sameh  [9]  noted  that  completion  of  a  Givens  or  Householder 
eigenvalue  algorithm  on  a  parallel  machine  may  be  done  by  a  parallel 
implementation  of  a  bisection  algorithm.  In  this  paper  we  investigate  the 
considerations  involved  in  the  implementation  of  this  idea  on  an  MIMD  (multiple 
instruction  stream  -  multiple  data  stream)  machine  with  a  smaller  number  of 
processors  available  than  eigenvalues  to  be  found. 
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There  are  three  major  alternatives  to  consider: 

1.  Convert  the  serial  Sturm  sequence  code  to  a  parallel  algorithm. 

2.  For  each  interval  compute  more  than  one  division  point  in  parallel. 

3.  Process  more  than  one  interval  at  a  time. 

We  will   show  that  the  last   alternative  has  certain  advantages  for  MIMD 
machines. 


2.  Theoretical  Background 

Let  A  be  a  real  tridiagonal  matrix  with  major  diagonal  elements  A,  j  =  7,-, 
for  i  =  l,...,n,  and  off-diagonal  elements  A,_i ,  =  A,  ,_i  =  3,,  for  i  =  2,...,n, 
where  n  is  the  order  of  A.  For  any  p.,  one  may  calculate  the  determinant  of 
A  -  fi,  det{A  -  p,),  by  minors  from  the  sequence 
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fM  =  71-^-. 

fM  =  ("y;-M')/i-i(p-)-p?/i-2(M'),      i  =  2,...,n, 

where  t/«f(A-jjL)  = /„(p,). 

Givens  [4,5]  demonstrated  that,  if  we  modify  this  sequence  slightly  to  avoid 
the  case  where  two  consecutive  terms  are  zero,  we  obtain  a  count  of  the 
eigenvalues  of  A  ^  p.  as  the  number  of  agreements  in  sign  between  successive 
terms,  counting  zero  as  positive.  It  then  follows  that  one  can  locate  any 
eigenvalue  to  any  required  accuracy  by  successively  bisecting  an  interval  known  to 
contain  the  desired  eigenvalues. 

Givens'  method  is  nimierically  very  stable.  The  error  in  the  eigenvalues  is 
no  worse  than  approximately  15  times  the  truncation  error  2~',  for  /-bit  mantissa 
binary  machines,  independent  of  the  order  n  of  A .  However,  direct  calculation  of 
the  sequence  /,(ji.)  can  easily  lead  to  underflow  and  overflow,  so  periodic  range 
checks  and  rescalings  are  required.  Barth,  Martin  and  Wilkinson  [1]  avoid  the 
need  to  rescale  by  using  the  sequence 

gM=fiMfi-My     1  =  1,. -n, 
alternatively  given  as 

sM  =  7i  -  M- 
SM  =  7.  -  M-  -  &f/gi-My    1  =  1,. ..n, 

with  zeros  replaced  by  small  non-zero  quantities.  Counting  positive  g,  is  the  same 
as  covmting  sign  agreements  of  /,,  which  yields  the  count  of  the  number  of 
eigenvalues  of  Asp,.  Equivalently,  counting  negative  ^,  is  the  same  as  counting 
sign  disagreements  of/,,  which  yields  the  count  of  eigenvalues  of  A<ji,.  As  well 
as  avoiding  rescaling,  the  use  of  gj  requires  one  division  in  place  of  two 
multiplications. 

As  Wilkinson  [10]  notes,  when  one  is  sufficiently  close  to  an  isolated 
eigenvalue,  one  can  accelerate  the  convergence  by  switching  from  bisection  to  an 
interpolation  scheme.  This  is  done  by  Huang  [8]  in  his  parallel  algorithm.  When 
eigenvalues  are  tightly  clustered,  simple  interpolation  is  actually  slower  than 
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bisection  [10].  However,  by  interpolating  on  a  suitable  root  of  the  determinant  of 
A-|x,  one  can  achieve  significant  improvements  in  efficiency  over  bisection,  even 
in  the  clustered  case  [2].  For  such  interpolations,  one  is  obliged  to  calculate  the 
sequence  of /,(jji),  rather  than  g,(|x). 


3.  Parallel  Algorithms  for  Bisection 

On  an  SIMD  (single  instruction  stream  -  multiple  data  stream)  machine,  we 
have  available  some  number  of  processors  which  will  use  a  common  instruction 
stream  in  synchrony.  At  any  given  time  some  of  the  processors  may  be  disabled, 
but  those  that  are  working  must  perform  the  same  tasks  at  the  same  time,  albeit 
on  different  data.  On  an  MIMD  (multiple  instruction  stream  -  multiple  data 
stream)  machine,  the  processors  may  work  independently  of  one  another.  TTiis 
allows  considerably  more  freedom  in  the  design  of  algorithms.  All  the  processors 
might  work  on  the  same  task  at  the  same  time  as  in  an  SIMD  machine  or  each 
processor  might  be  assigned  to  a  separate  task.  Synchrony  among  processors  is 
determined  by  the  need  for  common  access  to  the  same  data  structures,  not  by 
the  limitations  of  the  machine.  We  assume  an  NflMD  machine. 

The  Sturm  sequences  are  recurrences.  In  Givens'  original  formulation  of 
/,(p,),  no  divisions  are  required.  In  the  Barth,  Martin,  Wilkinson  formtilation, 
rescalings  are  avoided,  but  a  division  is  introduced.  In  converting  the  sequence 
calculation  to  a  parallel  algorithm,  we  need  not  retain  all  the  intermediate  terms. 
We  are  concerned  only  with  the  count  of  eigenvalues  and,  possibly,  with  the  final 
determinant.  Thus  we  have  one  or  two  output  values  to  compute  from  2n  input 
values,  i.e.  from  n  values  of  "v,,  n-1  values  of  p,,  and  one  value  of  p..  If  we 
have  m  processors,  then,  as  in  [7],  we  need  at  least 

/n 

time-steps  to  compute  such  values,  where  fx]  is  the  next  integer  above  x.  When 
m  s  2n,  the  lower  bound  is  fto^(2«)]  time-steps,  while,  when  n:sm<2n,  the 
lower  bound  is  approximately  fZo^(2n)l  +2n/m  time-steps,  and  when  m  <  rty  the 
lower  bound  is  approximately  \log(my\+2n/m  time-steps.  This  last  case,  which  is 


the  one  of  interest  in  this  paper,  gives  an  upper  bound  on  the  speedup  of 
approximately 

2n 


\logim)]  +  ^ 


In  practice,  no  numerically  stable  algorithm  is  known  that  can  achieve  these  lower 
boimds. 

The  difficulty  of  reorganizing  the  Sturm  sequence  calculation  makes  it  more 
tempting  to  compute  each  of  the  sequences  sequentially  and  to  compute  m  of 
them  in  parallel.  If  we  follow  Huang  [8]  and  use  that  approach  to  compute  m 
division  points  of  each  interval,  we  reduce  the  interval  size  by  a  factor  of  m+1  in 
each  time-step,  instead  of  by  a  factor  of  2.  If  we  remain  with  this  multisection 
approach  until  convergence,  we  can  expect  each  eigenvalue  to  take  approximately 

logim+\) 

time-steps,  where  €  is  the  absolute  error  required,  and  |A|  is  a  suitable  norm  of 
the  matrix.  This  gives  a  tog  (m  +  1)  speedup.  This  scheme  can  be  implemented  on 
appropriate  SIMD  machines. 

On  MIMD  machines,  we  have  the  alternative  of  computing  the  eigenvalues 
in  parallel,  doing  each  one  by  bisection.  The  total  number  of  time-steps  needed 
to  compute  k  isolated  eigenvalues  is  approximately 

\k/m\  (/og(^4|)-/og(€)), 

assuming  m  «  k.  As  m  approaches  k  we  will  have  idle  processors  due  to  the 
uneven  distribution  of  work  among  them.  If  we  could  avoid  that  case  and  could 
somehow  start  each  processor  working  on  an  a  priori  known  isolated  set  of 
eigenvalues,  we  could  obtain  a  speedup  of  m,  which  is  better  than  we  could  obtain 
by  doing  the  Sturm  sequence  in  parallel  or  doing  Huang's  multisection.  We  have 
the  further  advantage  of  being  able  to  apply  acceleration  to  the  final  convergence 
of  any  subset  of  the  eigenvalues  while  continuing  to  do  bisection  on  others. 
Naturally,  we  cannot  totally  avoid  the  uneven  distribution  of  work.  We  reduce 
the  effect  of  the  uneven  distribution  by  keeping  the  number,  m,  of  processors 
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small  compared  to  the  number  of  eigenvalues  to  be  computed.   In  that  case  the 
work  load  will  average  out. 


4.  Practical  Implementation 

Assume  an  MIMD  machine  with  m  processors,  where  m  is  smaller  than  the 
number  of  eigenvalues  to  be  computed.  If  m  were  larger,  we  would  have  no 
work  for  the  excess  processors  in  computing  eigenvalues  in  parallel,  and  would 
either  have  to  put  them  to  work  in  groups  on  individual  eigenvalues  in,  say, 
Huang's  multisection  technique  or  let  them  be  idle. 

The  Barth,  Martin,  WUkinson  algorithm,  BISECT,  as  modified  in  [2]  makes 
a  suitable  base  for  the  creation  of  a  parallel  algorithm  which  computes  multiple 
eigenvalues  in  parallel.  As  each  eigenvalue  is  computed,  information  is  stored  on 
upper  and  lower  bounds  for  various  eigenvalues  that  are  encountered.  To  avoid 
unnecessary  interlocks,  we  start  with  only  one  processor  active  for  the  entire 
interval,  and  we  use  the  bound  information  to  dedde  when  to  invoke  additional 
processors. 

The  technique  consists  of  having  each  invocation  of  the  bisection  routine 
attempt  to  generate  additional  invocations  of  itself  on  finding  an  interval  with 
more  than  one  eigenvalue  and  with  a  division  point  among  the  eigenvalues.  The 
alternative  would  be  to  start  many  processors  in  parallel  with  a  priori  assigned 
intervals,  using  a  queue  management  scheme  to  assign  new  intervals  to  processors 
as  they  complete  their  tasks.  Grit  and  McGraw  [6]  have  demonstrated  that  the 
dynamic  spawning  of  processes  for  divide  and  conquer  on  an  MIMD  machine 
incurs  little  overhead  and  avoids  the  complexity  of  code  needed  for  dynamic 
queue  management. 

Thus  we  use  dynamic  spawning  in  which  the  calling  routine  continues  to 
search  the  interval  above  the  division  point,  while  the  newly  invoked  routine 
searches  the  interval  below  the  division  point.  On  each  such  invocation,  a 
properly  interlocked  counter  is  increased  by  one.  On  completion  of  work  by  a 
processor,  the  same  counter  is  decreased  by  one.  Whichever  processor  finds  that 
it  is  the  last  to  sign  out  calls  the  routine  DONE  to  print  the  final  eigenvalues. 
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This  approach  avoids  the  need  for  any  one  processor  to  sit  idle  while  waiting  for 
the  others  to  sign  out,  since  no  invocation  of  BISECT  waits  for  any  other  to 
finish.  Thus  all  processors  are  devoted  to  solution  of  the  problem,  not  to  process 
management,  and  become  free  for  other  tasks  as  soon  as  possible. 

This  approach  also  avoids  the  need  to  have  a  priori  known  starting  points  for 
the  processors.  The  price  paid  for  this  flexibility  and  simplicity  is  that  no 
processor  is  applied  to  the  task  until  a  higher  level  processor  has  taken  at  least 
one  bisection  step.  This  reduces  the  speedup  to  less  than  m. 

We  can  analyse  the  effect  on  the  speedup  by  considering  two  extreme  cases. 
K  we  use  a  very  effective  accelerated  bisection,  then  we  may  isolate  each  duster 
eigenvalue  from  the  others  on  the  very  first  step  involving  that  cluster.  On  the 
other  hand,  we  may  have  a  bisection  without  any  acceleration  which  simply  tends 
to  partition  eigenvalue  clusters  into  equal  halves.  In  the  accelerated  case,  it  will 
take  m  time-steps  to  start  m  processors  working.  In  the  other  case,  \log(m)]+l 
time-steps  will  suffice  to  get  m  processors  working.  (We  will  use  "step"  to  mean 
a  computational  step  which  may  be  done  in  parallel  with  some  other  "step",  and 
"time-step"  to  mean  elapsed  real- world  time). 

Let  Nfoi  be  the  total  number  of  bisection  steps  required.  Let  k  be  the  number 
of  clusters  of  eigenvalues.  In  the  highly  accelerated  case,  an  extra  k  steps  will  be 
required  before  the  last  eigenvalue  cluster  is  approached.  Thus  the  expected 
number  of  steps,  N^ig,  spent  working  on  an  eigenvalue  cluster  by  itself  will  be 
given  by  the  following  approximation 

and  the  expected  speedup  will  be  approximately 


\k/m]Ngig  +  k 


When  clusters  consist  of  eigenvalues  which  can  be  separated  after  some  large 
number  of  steps,  N^ig  may  vary  from  cluster  to  cluster.  This  changes  the  speedup 
and  allows  effective  use  of  additional  processors. 
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In  general,  the  best  speedup  is  obtained  in  the  non-accelerated  case,  when  m 
processes  are  spawned  in  f/og(m)l  +  l  time-steps  involving  approximately  2m— 1 
of  the  Nu,,  steps,  and  when  the  remaining  steps  are  uniformly  distributed  among 
the  processors.  This  gives  a  speedup  of  approximately 


iN^-2m+l)/m  +  \log{m)]+l  ' 

This  is  a  better  speedup  than  in  the  accelerated  case.  This  would  seem  an 
argument  against  acceleration  in  a  parallel  implementation  of  bisection.  One 
must  recall,  however,  that  N,o,  varies  between  the  two  cases.  With  acceleration 
N^,  is  smaller  than  without  acceleration.  Thus  the  decision  must  be  based  on  the 
tradeoff  between  the  reduction  in  Nio,  provided  by  acceleration  and  the  higher 
overhead  in  starting  parallel  processes. 


5.  Test  Results 

The  method  described  above  was  implemented  using  an  MIMD  simulator 
[3].  This  simulator  provides  processors  with  private  memories  and  access  to  a 
common  memory,  part  of  which  may  be  interlocked  so  that  only  one  processor 
may  do  a  fetch-restore  cycle  at  one  time. 

The  algorithm  was  that  given  in  [2]  for  accelerated  bisection,  and  was  tested 
on  several  test  matrices.  The  following  is  a  typical  case. 

When  applied  to  a  21  by  21  tridiagonal  matrix,  W£  [10],  for  1,  2,  4,  8,  16 
and  21  processors,  the  code  produced  the  expected  eigenvalues  in  a  total  of  505 
steps  (sxmamed  over  all  processors)  to  a  precision  of  .4*10"^^  in  all  cases,  and 
the  eigenvalues  produced  were  independent  of  the  number  of  processors.  The 
elapsed  time,  speedup,  measured  and  expected  efficiencies  were: 


-8- 


laeasxired  expected 


processors 

time 

speedup   ei 

rficiency 

efficiency 

1 

1708336 

•  ^' 

2 

860608 

1.99 

99X 

98X-99X 

4 

466007 

3.67 

92% 

93X-99X 

8 

266385 

6.41 

80X 

83X-97X 

16 

166627 

10.25 

64X 

69X-91X 

21 

148467 

11.51 

55% 

53X-89X 

The  expected  efficiency  was  based  on  the  formulae  above  and  the  assumption 
of  sixteen  eigenvalue  clusters.  It  should  be  noted  that  each  of  the  clusters 
consists  of  one  or  two  eigenvalues  and  the  two-eigenvalue  clusters  all  eventuaUy 
split,  though  in  one  case  only  after  15  digits.  The  statistics  of  the  MIMD 
simulator  showed  that  the  loss  of  efficiency  was  almost  entirely  from  the 
variations  in  workload  for  the  processors  assigned  and  from  overhead  in  the 
initiation  of  new  invocations  of  BISECT,  not  from  interlock  delays.  At  these 
efficiencies,  reversion  to  an  unaccelerated  bisection  would  not  pay,  since  it  would 
more  than  double  the  number  of  cycles  at  this  high  precision. 
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6.  Appendix  -  Parallel  Variant  of  BISECT 

The  following  FORTRAN  code  uses  the  parallel  computer  simulator  RINSE 
[3]  to  perform  parallel  bisection.  The  special  entry  point,  CALLPE,  similar  to 
the  original  CREATE,  has  been  added  to  provide  an  immediate  return  when  no 
processor  is  available.  This  allows  the  code  to  nm  with  an  unknown,  or  even 
varying  number  of  processors,  without  deadlocks  due  to  misestimates  of  the 
number  of  procesors.  In  an  environment  where  the  available  number  of 
processors  is  fixed,  this  code  could  be  replaced  by  an  interlocked  counter  and  calls 
to  CREATE. 
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SUBROUTINE  BISECT ( GAMMA , BETA , BETASQ , M , M1 , M2 , EPS 1 , EPSMAC , 
♦  EPS2,Z,X,ESTL0, OUTER, ASYN) 
C 

C  ACCELERATED  STURM  BISECTION  OF  TRIDIAGONAL  MATRIX 

C  HERBERT  J.  BERNSTEIN,  MARCH  1982 
C 

c 

C  REVISED  NOV  1982-FEB  1983  FOR  USE  ON  CYBER  WITH  RINSE 

C  BLOCK  IF  CONSTRUCTS  REMOVED,  CALLS  TO  ALLOW  PARALLEL 

C  EXECUTION  INTRODUCED. 

C  H.J.B.  FEBRUARY  1983 

C 

C  INPUT  FORMAL  PARAMETERS 

C 

C  GAMMA   -  THE  MAIN  DIAGONAL  OF  THE  MATRIX 

C  BETA    -  THE  SUBDIAGONAL  OF  THE  MATRIX,  BETA(1)=0 

C  BETASQ  -  THE  SQUARES  OF  THE  SXTBDIAGONAL 

C  N      -  THE  ORDER  OF  THE  MATRIX 

C  M1,M2   -  THE  RANGE  OF  ORDINALS  OF  EIGENVALUES  DESIRED 

C  EPS1    -  TOLERANCE  OF  EIGENVALUES 

C  EPSMAC  -  SMALLEST  NUMBER  SUCH  THAT  1+EPSMAC  .GT.  1 

C  OUTER   -  .TRUE.  FOR  ORIGINAL  CALL,  .FALSE.  FOR  INNER  CALLS 

C 

C  OUTPUT  FORMAL  PARAMETERS 

C 

C  EPS2    -  ATTAINABLE  TOLERANCE  OF  EIGENVALUES 

C  Z       -  TOTAL  NUMBER  OP  STEPS  PERFORMED 

C  MUST  BE  ASYNCHRONOUS 

C  X       -  RESULTING  EIGENVALUES 

C  ESTLO   -  A  SCRATCH  ARRAY  HOLDING  LOWER  ESTIMATES 

C  ASYN    -  SCRATCH  ASYNCHRONOUS  LOCATION  FOR  INTERLOCK  USE 

C 

C  INTERNAL  VARIABLES 
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C  ALPHA   -  WEIGHT  OF  LOWER  ENDPOINT  XU  IN  INTERPOLATION 

C  BDHI    -  HIGH  BOUND  FOR  RESCALING 

C  BDHRCP  -  RECIPROCAL  OF  BDHI 

C  DELTA   -  HOW  CLOSELY  WE  MAY  APPROACH  ENDS  OF  INTERVAL 

C  DIFF    -  DIFFERENCE  OF  LOGS  OF  ROOTS  OF  DETERMINANTS 

C  EPSLOG  -  LOG  OF  EPSMAC  FOR  BOUNDS  ON  DIFF 

C  F       -  PREVIOUS  DETERMINANT  IN  CALCULATION  BY  MINORS 

C  FN      -  CURRENT  DETERMINANT  IN  CALCULATION  BY  MINORS 

C  FP      -  SCRATCH  IN  GOING  FROM  F  TO  FN 

C  FX1     -  LOG  OF  DETERMINANT  AT  INTERPOLATION  POINT  XI 

C  FXO     -  LOG  OF  DETERMINANT  AT  HIGH  ENDPOINT  XO 

C  FXU     -  LOG  OF  DETERMINANT  AT  LOW  ENDPOINT  XU 

C  H       -  SCRATCH  IN  FINDING  BOUNDS  ON  EIGENVALUES 

C  I       -  SCRATCH  FOR  LOOP  INDICES 

C  lA      -  EIGENVALUE  ORDINAL  BOUND 

C  lEXP    -  LOG  OF  SCALINGS  DONS 

C  IHI     -  COUNT  OF  EIGENVALUES  BELOW  XO 

C  II      -  SCRATCH  FOR  LOOP  INDICES 

C  ILO     -  COUNT  OF  EIGENVALUES  BELOW  XU 

C  K       -  ORDINAL  OF  EIGENVALUE  BEING  FOUND 

C  KK      -  SCRATCH  LOOP  INDEX  TO  COMPUTE  K 

C  LOSCAL  -  LOG  OF  LOW  BOUND  FOR  RESCALING 

C  MM1,MM2-  LOCAL  COPIES  OF  Ml ,  M2 

C  RELRCP  -  RECIPROCAL  OF  EPSMAC  (A  BIG  NUMBER) 

C  WIDTH   -  WIDTH  OF  TEST  INTERVAL 

C  XI      -  INTERPOLATION  POINT 

C  XLBD    -  LOWER  BOUND  FOR  RESCALING 

C  XLIM    -  MINIMUM  SIZE  OF  TEST  INTERVAL 

C  XMAX    -  UPPER  BOUND  FOR  EIGENVALUES 

C  XMIN    -  LOWER  BOUND  FOR  EIGENVALUES 

C  XNEAR   -  A  LOWER  BOUND  ON  RELATIVE  DISTANCE  FROM  XI  TO  XU.XO 

C  XO      -  UPPER  ENDPOINT  OF  TEST  INTERVAL 
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C  XO     -  LOWER  ENDPOINT  OF  TEST  INTERVAL 

C  XWIDTH  -  NEXT  INTERVAL  WIDTH 

C  ZBUSY   -  FLAG  FOR  OTHER  PROCESSORS  BUSY 

C 

C  STANDARD  FUNCTIONS  CALLED 

C 

C  ALOG    -  NATURAL  LOGARITHM 

C  AMAX1   -  MAXIMUM  VALUE 

C  AMIN1   -  MINUMUM  VALUE 

C  DABS    -  DOUBLE  PRECISION  ABSOLUTE  VALUE 

C  DEXP    -  DOUBLE  PRECISION  EXPONENTIAL 

C  DFLOAT  -  DOUBLE  PRECISION  FROM  INTEGER 

C  FLOAT   -  REAL  FROM  liOTEGER 

C  SN6L    -  REAL  FROM  DOUBLE  PRECISION 

C 

c 

C  ASSUMED  SUPPORT  ROUTINES 

C 

C  CALLPE  -  INVOKE  A  PARALLEL  PROCESS,  RETURN  .FALSE.  IF 

C  NO  PROCESSOR  AVAILABLE  IMMEDIATELY 

C  DONE    -  A  ROUTINE  TO  CALL  WHEN  ALL  CALCULATIONS  ARE  DONE 

C         IDLEPE  -  A  ROUTINE  TO  CALL  WHEN  THIS  PROCESSOR  HAS  NO  MORE  WORK 

C  PURGE   -  A  ROUTINE  TO  FORCE  A  SYNCHRONOUS  VARIABLE  EMPTY 

C  MUCH  OF  THE  STRUCTURE  OF  THIS  ROUTINE  IS  DERIVED  FROM  THE 

C  ALGOL  PROCEDURE  "BISECT",  BY  BARTH,  MARTIN  AND  WILKINSON, 

C         NUMER.  MATH.  9  (1967)  386-393. 

C 

C         MOST  NON- INTEGER  VARIABLES  MUST  BE  DOUBLE  PRECISION. 

C 

IMPLICIT  DOUBLE  PRECISION  (A-H,0-Z) 
C 

C         HOWEVER,  THE  FOLLOWING  MAY  BE  HANDLED  AT  LOWER  PRECISION 
C         WITHOUT  SERIOUS  EFFECT. 
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C 

RSAL  FX1 ,FXO,PXU, DELTA, XWIDTH.XLIM, ALPHA, EPSLOG.DIFF 

REAL  XNEAR,RELRCF 

INTEGER  Z 

INTEGER  ASYN 

LOGICAL  OUTER, CALLPE,ZBUSY 

EXTERNAL  RBISECT 

COMMON  /CBISECT/  RELRCP,EPSLOG,XMIN,XMAX,BDHI,BDHRCP,XLBD 

C 

C  THE  ARRAYS  ARE  AS  FOLLOWS: 


C 


c 


DIMENSION  GAMMA(N) ,BETA(N) ,BETASQ(N) ,X(M2) ,ESTL0(M2} 


ZBUSY  =  .FALSE. 
C 

C  COMPUTE  THE  EIGENVALUE  BOUNDS  XMIN,  XMAX 

C  AND  FORCE  THE  ASYNCHRONOUS  VARIABLES  TO  BE  EMPTY 

C  BUT  ONLY  ON  THE  OUTER  INVOCATION  OF  BISECT. 

C  SKIP  THIS  CODE  ON  ALL  INNER  INVOCATIONS. 


IF  (.NOT. OUTER)  GO  TO  1500 

CALL  PURGE (ASYN) 

CALL  PURGE(Z) 

ASYN=0 

Z=0 

BETA(1)  =  0. 

BETASQ(1)  =  0. 

XMIN = GAMMA  ( N  ) -DABS  ( BETA  ( N  )  ) 

XMAX=GAMMA(N)-t-DABS(B£TA(N) ) 

DO  1000  11=2, N 

I=N+1-II 

H=DABS(BETA(I)  )-«-DABS(B£TA(I-«-1 )  ) 

IP(GAMMA(I)+H.GT.XMAX)  XMAX=GAMMA(  I)+H 
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IF(GAMMA(I)-H.LT.XMIN)  XMIN=GAMMA(I)-H 
1000  CONTINUE 
C 

C  COMPUTE  THE  TOLERANCES 

C 

RELRCP=  1  .DO/EPSMAC 

EPSLOG=ALOG(SNGL(EPSMAC)  ) 

IF  ( XMIN+XMAX .  GT .  0  .  )  EPS2= EPSMAC*XMAX 

IF  ( XMIN+XMAX .  LE .  0  .  )  EPS2  =  EPSMAC*  ( -XMIN ) 

IF(EPS1.LE.O.)  EPS1  =  EPS2 

EPS2=  0 .  5D0*EPS1  +7 .  D0*EPS2 
C 

C  DERIVE  THE  PARAMETERS  FOR  RESCALING  FROM  EPSMAC 

C 

LOSCAL=  EPSLOG/2 . 

BDHI  =  DEXP ( DFLOAT ( - 2 *LOSCAL)  ) 

XLBD=  DEXP  ( DFLOAT  ( LOSCAL )  ) 

BDHRCP=  1  .DO /BDHI 
C 

C  INNER  BLOCK 

C 

XO=XMAX 

DO  2000  I=M1,M2 

X(I)=XMAX 

ESTLO(I)=XMIN 
2000  CONTINUE 
1500  CONTINUE 
C 

MM1  =  M1 

MM2  =  M2 

K=M2+1 
2010  K=K-1 

IF  (K.LT.MM1)  GO  TO  4000 
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C  LOOK  FOR  THE  K-TH   EIGENVALUE 

C 

XU=XMIN 
C 

C  FIND  INITIAL  TEST  INTERVAL  BOUNDS 

C 

DO  2020  II=MM1,K 
I=MM1+K-II 

IF(XU.GE.ESTLO(I))  GO  TO  2020 
XU=ESTLO(I) 
GO  TO  2030 
2020  CONTINUE 

2030  IF(XO.GT.X(K))  XO=X(K) 
C 

FXU=  -RELRCP 
FXO=  -RELRCP 
ILO=0 
IHI=N 

WIDTH= RELRCP 
XNEAR=0. 
C 

C  LOOP  TO  DIVIDE  THE  INTERVAL 

C 
3000  XLIM=2.DO»EPSMAC*(DABS(XU)+DABS(XO))+EPS1 
XWIDTH=XO-XU 

IF(XWIDTH.LE.XLIM)  GO  TO  3900 
X1=(XU+XO)/2.D0 
XNEAR=  XNEAR/2 .  EO 

IF (XWIDTH*1.333E0.GT. WIDTH)  XNEAR=.25E0 
DELTA=  2 .  EO  *XLIM/XWIDTH 
IF(DELTA.GT.2.5E-1)  GO  TO  3005 
DELTA= AMAX 1  ( DELTA ,  XNEAR ) 
DIFF=AMAX1 (EPSLOG.AMINI ( -EPSLOG, (FXO-FXU)/FLOAT( IHI-ILO) ) ) 
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ALPHA=  ( 1 . -DELTA )♦(( EXP ( DIFF ) +DELTA/ ( 1 . -DELTA ))/( EXP ( DIFF )  + 1 .) ) 
X1  =  DBL£( ALPHA  }»XU+(1 . DO -DBLE (ALPHA) )«X0 
3005  CONTINUE 

WIDTH=XWIDTH 
Z=Z+1 

c 

C  COMPUTE  DETERMINANT  BY  MINORS 

C 

IA=0 

IEXP=0 

FN=1.D0 

P=0.D0 
C 

DO  3100  1=1, N 

FP=F*BETASQ(I) 

F=FN 

FN=  (GAMMA( I ) -XI ) »F-FP 

IF(FN.6E.0.D0)  60  TO  3010 

IA=IA+1 

FN=-FN 

F=-F 
3010  IF  (FN.GT.XLBD)  GO  TO  3020 
C 

C  DETERMINANT  TOO  SMALL,  RESCALE  UPWARDS 

C 

FN=FN*BDHI 

F=F»BDHI 

IEXP=  IEXP+2»L0SCAL 

IF(FN.EQ.O.DO)    FN=EPSMAC*F 

GO  TO  3010 
3020  CONTINUE 
C 
C  DETERMINANT  TOO  LARGE,  RESCALE  DOWNWARDS 
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C 

IF(FN.LE.BDHI)  GO  TO  3100 
FN=FN»BDHRCP 
F=F»BDHRCP 
IEXP=  IEXP-2»L0SCAL 
3100  CONTINUE 
C 

C  DETERMINANT  AND  EIGENVALUE  COUNT  FOUND,  UPDATE  PARAMETERS 

C 

FX1  =  DMAX1  ( DABS  ( FN )  ,  EPSMAC ) 
FX 1  =  ALOG  ( FX 1 ) +PLOAT  ( lEXP ) 
IF(IA.GE.K)  GO  TO  3200 
C 

C  INTERPOLATION  POINT  BECOMES  NEW  LOWER  BOUND 

C 

XU=X1 
FXU=FX1 
ILO=IA 

IF(IA.LT.MMI)  ESTLO(MMl)=X1 
IF(IA.LT.MMI)  GO  TO  3150 
C 

C  THERE  ARE  EIGENVALUES  IN  THE  INTERVAL  BEING  DISCARDED 

C  ATTEMPT  TO  INVOKE  A  NEW  COPY  OF  BISECT  VIA  THE  DUMMY 

C  ROUTINE  RBISECT. 


C 


ESTL0(IA+1)=X1 

IF(X(IA).GT.X1)  X(IA)=X1 

ZBUSY  =  .NOT.  (CALLPE (RBISECT, GAMMA, BETA, BETASQ.N, MM 1,IA, EPS 1, 
*  EPSMAC, EPS2,Z,X,ESTL0, . FALSE ., ASYN ) ) 

IF  (.NOT. ZBUSY)  GO  TO  3160 
3150  CONTINUE 

GO  TO  3000 
3160  MM1  =  IA+1 
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ASYN=ASYN+1 

GO  TO  3000 
C 
C  INTERPOLATION  POINT  BECOMES  NEW  UPPER  BOUND 

C 
3200  X0=X1 

FX0=PX1 

IHI=IA 

GO  TO  3000 
3900  X(K)  =  (XO+XU)/2.D0 

GO  TO  2010 
C 

C  HERE  TO  RETURN,  OR  LOOP  FOR  DAUGHTERS  IP  OUTER  CALL 

C 
4000  ITEMP=ASYN-1 

ASYN=ITEMP 

IP(ITEMP.LE.-I)  CALL  DONE 

CALL  IDLEPE 

RETURN 

END 

SUBROUTINE  RBISECT(GAMMA,BETA,BETASQ,N,M1 ,M2,EPS1 ,EPSMAC, 
»  EPS2,Z,X,ESTL0, OUTER, ASYN) 
C 

C         SINCE  THE  RINSE  SIMULATION  COPIES  STORAGE,  AND  THE 
C         CYBER  PASSES  ARGUMENTS  BY  ADDRESS,  NOT  VALUE,  THE 
C         VARIABLES  M1  AND  M2  WHICH  ARE  NEEDED  BOTH  AS  INPUT 
C         TO  ANY  COPY  OF  BISECT,  AND  AS  OUTPUT  TO  THE  NEXT 
C         NESTED  INVOCATION,  MUST  HAVE  TWO  DISTINCT  INCARNATIONS. 
C         THE  VARIABLES  MM1  AND  MM2  PROVIDE  THE  NECESSARY  ALTERNATE 
C         INCARNATIONS .   IF  WE  ATTEMPTED  TO  DO  THIS  WITHIN  BISECT 
C         DIRECTLY,  WE  WOULD  END  UP  WITH  A  SINGLE  INCARNATION  ON 
C         A  LOWER  LEVEL.   THUS  WE  HAVE  THIS  MINIMAL 
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C  INTERFACE  ROUTINE. 

C 

C  A  SIMILAR  TECHNIQUE  IS  NEEDED  FOR  GENERAL  DIVIDE  AND 

C  CONQUER  CODES  WITH  RINSE. 


C 


IMPLICIT  DOUBLE  PRECISION  (A-H,0-Z) 
INTEGER  Z.ASYN 
LOGICAL  OUTER 

DIMENSION  GAMMA(N) ,BETA(N) ,BETASQ(N} ,X(M2) ,ESTL0(M2) 
MM1=M1 
MM2  =  M2 

CALL  BISECT ( GAMMA , BETA , BETASQ , N , MM1 , MM2 , EPS 1 , EPSMAC , 
»  EPS2,Z,X,ESTL0, OUTER, ASYN) 
RETURN 
END 

DOUBLE  PRECISION  FUNCTION  DFLOAT(I) 
DFLOAT=  I 
RETURN 
END 

DOUBLE  PRECISION  FUNCTION  DBLE{X) 
REAL  X 
DBLE=X 
RETURN 
END 
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